Primary analyses
H1: Are human relatedness judgments inversely related to cosine distance?
First, we ask whether participants' relatedness judgments covary in the expected way (i.e., have a negative relationship) with cosine distance measures from BERT and ELMo.
Load modeling data
Here, we load the results from the neural language model analyses and merge them with our norming data.
df_distances = read_csv("../../data/processed/stims_processed.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## Class = col_character(),
## ambiguity_type = col_character(),
## ambiguity_type_mw = col_character(),
## ambiguity_type_oed = col_character(),
## different_frame = col_character(),
## distance_bert = col_double(),
## distance_elmo = col_double(),
## overlap = col_character(),
## same = col_logical(),
## source = col_character(),
## string = col_character(),
## version = col_character(),
## word = col_character()
## )
nrow(df_distances)
## [1] 690
df_merged = df_normed_critical %>%
left_join(df_distances, by = c("word", "version", "string", "overlap",
"source", "same", "Class", "ambiguity_type"))
nrow(df_merged)
## [1] 8855
length(unique(df_merged$subject))
## [1] 77
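Because a left_join can silently duplicate rows when the join keys are not unique on the right-hand side, it is worth sanity-checking a merge like this. A minimal self-contained sketch of that check, using hypothetical toy data (not the real stimuli):

```r
library(dplyr)

# Toy illustration (hypothetical data): with unique keys on the right,
# left_join preserves the row count of the left table and fills every row.
trials <- tibble(word = c("bank", "bank", "cone"),
                 version = c("a", "b", "a"),
                 relatedness = c(3, 1, 4))
distances <- tibble(word = c("bank", "bank", "cone"),
                    version = c("a", "b", "a"),
                    distance_bert = c(0.12, 0.41, 0.20))
merged <- left_join(trials, distances, by = c("word", "version"))

stopifnot(nrow(merged) == nrow(trials))       # no row inflation
stopifnot(!any(is.na(merged$distance_bert)))  # every trial got a distance
```

The same pair of assertions, run against df_merged and df_normed_critical, would flag duplicated stimulus rows or unmatched trials immediately after the merge.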
Analysis: BERT
model_bert = lmer(data = df_merged,
relatedness ~ distance_bert +
Class +
(1 + distance_bert | subject) +
(1 + distance_bert | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
model_null = lmer(data = df_merged,
relatedness ~
Class +
(1 + distance_bert | subject) +
(1 + distance_bert | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
anova(model_bert, model_null)
## Data: df_merged
## Models:
## model_null: relatedness ~ Class + (1 + distance_bert | subject) + (1 + distance_bert |
## model_null: word)
## model_bert: relatedness ~ distance_bert + Class + (1 + distance_bert | subject) +
## model_bert: (1 + distance_bert | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_null 9 28225 28289 -14104 28207
## model_bert 10 28108 28178 -14044 28088 119.76 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_bert)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: relatedness ~ distance_bert + Class + (1 + distance_bert | subject) +
## (1 + distance_bert | word)
## Data: df_merged
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 28107.5 28178.4 -14043.8 28087.5 8845
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5206 -0.6433 0.0415 0.6182 3.7695
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 0.99949 0.9997
## distance_bert 32.56057 5.7062 -0.75
## subject (Intercept) 0.05077 0.2253
## distance_bert 0.56305 0.7504 -0.51
## Residual 1.26206 1.1234
## Number of obs: 8855, groups: word, 115; subject, 77
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 4.12020 0.10967 37.570
## distance_bert -8.29528 0.56482 -14.687
## ClassV 0.09571 0.14974 0.639
##
## Correlation of Fixed Effects:
## (Intr) dstnc_
## distanc_brt -0.710
## ClassV -0.329 -0.006
Analysis: ELMo
model_elmo = lmer(data = df_merged,
relatedness ~ distance_elmo +
Class +
(1 + distance_elmo | subject) +
(1 + distance_elmo | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
model_null = lmer(data = df_merged,
relatedness ~
Class +
(1 + distance_elmo | subject) +
(1 + distance_elmo | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
anova(model_elmo, model_null)
## Data: df_merged
## Models:
## model_null: relatedness ~ Class + (1 + distance_elmo | subject) + (1 + distance_elmo |
## model_null: word)
## model_elmo: relatedness ~ distance_elmo + Class + (1 + distance_elmo | subject) +
## model_elmo: (1 + distance_elmo | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_null 9 28780 28844 -14381 28762
## model_elmo 10 28664 28734 -14322 28644 118.71 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_elmo)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: relatedness ~ distance_elmo + Class + (1 + distance_elmo | subject) +
## (1 + distance_elmo | word)
## Data: df_merged
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 28663.6 28734.5 -14321.8 28643.6 8845
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4213 -0.6761 0.0669 0.6273 3.2055
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 1.11924 1.0579
## distance_elmo 357.28540 18.9020 -0.80
## subject (Intercept) 0.04255 0.2063
## distance_elmo 5.37455 2.3183 -0.49
## Residual 1.35192 1.1627
## Number of obs: 8855, groups: word, 115; subject, 77
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 4.3522 0.1173 37.109
## distance_elmo -27.2957 1.8750 -14.557
## ClassV 0.4004 0.1501 2.666
##
## Correlation of Fixed Effects:
## (Intr) dstnc_
## distance_lm -0.765
## ClassV -0.302 0.002
Discussion
In both cases, we find that a fixed effect of cosine distance improves model fit. This very general test confirms that cosine distance does capture information about relatedness judgments. Later, we also correlate cosine distance with mean relatedness judgments by item, the more typical test used in the word-similarity literature.
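As a preview, that by-item test can be sketched with simulated data (illustrative only, not the actual norms): average the judgments per item, then correlate the item means with cosine distance.

```r
library(dplyr)

# Simulated sketch of the by-item test (data are illustrative, not the real
# stimuli): 20 items x 5 raters, judgments decrease with cosine distance.
set.seed(1)
items <- tibble(item = rep(1:20, each = 5),
                distance = rep(runif(20), each = 5)) %>%
  mutate(relatedness = pmax(0, pmin(4, 4 - 4 * distance + rnorm(n(), 0, 0.5))))

by_item <- items %>%
  group_by(item) %>%
  summarise(mean_relatedness = mean(relatedness),
            distance = first(distance))

cor(by_item$mean_relatedness, by_item$distance)  # should be strongly negative
```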
H2: Do people judge same sense usages to be more related than different-sense usages?
As predicted, we find that pairs belonging to the same sense are judged to be more related than pairs belonging to different senses. We control for cosine distance (from both models) in this analysis.
df_merged %>%
group_by(same) %>%
summarise(mean_relatedness = mean(relatedness),
median_relatedness = median(relatedness),
sd_relatedness = sd(relatedness))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 4
## same mean_relatedness median_relatedness sd_relatedness
## <lgl> <dbl> <dbl> <dbl>
## 1 FALSE 1.37 1 1.48
## 2 TRUE 3.47 4 1.01
model_same = lmer(data = df_merged,
relatedness ~ same +
distance_bert + distance_elmo +
Class +
(1 + same | subject) +
(1 + same | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
model_null = lmer(data = df_merged,
relatedness ~
distance_bert + distance_elmo +
Class +
(1 + same | subject) +
(1 + same | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
anova(model_same, model_null)
## Data: df_merged
## Models:
## model_null: relatedness ~ distance_bert + distance_elmo + Class + (1 + same |
## model_null: subject) + (1 + same | word)
## model_same: relatedness ~ same + distance_bert + distance_elmo + Class +
## model_same: (1 + same | subject) + (1 + same | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_null 11 25217 25295 -12597 25195
## model_same 12 24999 25084 -12488 24975 219.68 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_same)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: relatedness ~ same + distance_bert + distance_elmo + Class +
## (1 + same | subject) + (1 + same | word)
## Data: df_merged
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 24998.9 25084.0 -12487.5 24974.9 8843
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.5243 -0.5143 0.0571 0.5469 4.0964
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 1.10563 1.0515
## sameTRUE 0.87013 0.9328 -0.95
## subject (Intercept) 0.09007 0.3001
## sameTRUE 0.14292 0.3780 -0.78
## Residual 0.88922 0.9430
## Number of obs: 8855, groups: word, 115; subject, 77
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.77281 0.12087 14.667
## sameTRUE 1.91240 0.10243 18.670
## distance_bert -0.54397 0.11815 -4.604
## distance_elmo -2.25685 0.47972 -4.704
## ClassV 0.04891 0.08554 0.572
##
## Correlation of Fixed Effects:
## (Intr) smTRUE dstnc_b dstnc_l
## sameTRUE -0.871
## distanc_brt -0.244 0.136
## distance_lm -0.350 0.171 -0.186
## ClassV -0.167 0.000 0.054 -0.052
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5) +
theme_minimal() +
facet_wrap(~same)
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5,
aes(y = (..density..))) +
theme_minimal() +
facet_wrap(~same)
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5) +
# scale_x_continuous(limits = c(0, 4)) +
theme_minimal()
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5,
aes(y = (..density..))) +
# scale_x_continuous(limits = c(0, 4)) +
theme_minimal()
Discussion
We find that, indeed, same sense usages are judged as more related than different sense usages. In fact, Same Sense explains variance in relatedness even when cosine distance is adjusted for.
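A complementary check is whether the same-sense advantage holds item by item rather than only in the aggregate. A toy sketch of that per-word comparison (illustrative data, not the real norms):

```r
library(dplyr)

# Toy sketch (hypothetical data): compute each word's mean relatedness gap
# between same-sense and different-sense pairs.
set.seed(2)
toy <- tibble(word = rep(letters[1:10], each = 8),
              same = rep(c(TRUE, FALSE), times = 40)) %>%
  mutate(relatedness = ifelse(same, 3.5, 1.4) + rnorm(n(), 0, 0.8))

gaps <- toy %>%
  group_by(word) %>%
  summarise(gap = mean(relatedness[same]) - mean(relatedness[!same]))

mean(gaps$gap > 0)  # proportion of words showing a positive same-sense gap
```

In the real data, the analogous computation on df_merged would show how consistently the effect reported above holds across the 115 words.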
H3: Does relatedness differ as a function of ambiguity type?
Here, we want to know whether pairs categorized as homonymous are seen as less related, on average, than pairs categorized as polysemous.
Of course, if there is an effect of ambiguity_type, we expect it to show up primarily for different sense words. This could be modeled in one of two ways:
- Using only different sense words, we can ask whether there's a main effect of ambiguity_type.
- Using all data, we can ask whether there's a significant interaction between ambiguity_type and same sense.
(Note that we might also expect a main effect of ambiguity_type for all data when the interaction is not controlled for, simply because the different sense judgments might drag the average relatedness judgments down for those pairs.)
Analysis 1: Different-sense only
One way to model this is to ask: for different sense pairs only, does ambiguity_type predict differences in relatedness judgments? We predict that homonymous words should have lower relatedness judgments on average than polysemous words.
df_merged_diff_only = df_merged %>%
filter(same == FALSE)
df_merged_diff_only %>%
group_by(ambiguity_type) %>%
summarise(mean_relatedness = mean(relatedness),
median_relatedness = median(relatedness),
sd_relatedness = sd(relatedness))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 4
## ambiguity_type mean_relatedness median_relatedness sd_relatedness
## <chr> <dbl> <dbl> <dbl>
## 1 Homonymy 0.467 0 0.864
## 2 Polysemy 1.75 2 1.50
## 3 Unsure 3.40 4 1.02
model_at = lmer(data = df_merged_diff_only,
relatedness ~ ambiguity_type +
distance_bert + distance_elmo +
Class +
(1 + ambiguity_type | subject) +
(1 | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
## boundary (singular) fit: see ?isSingular
model_null = lmer(data = df_merged_diff_only,
relatedness ~
distance_bert + distance_elmo +
Class +
(1 + ambiguity_type | subject) +
(1 | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
## boundary (singular) fit: see ?isSingular
anova(model_at, model_null)
## Data: df_merged_diff_only
## Models:
## model_null: relatedness ~ distance_bert + distance_elmo + Class + (1 + ambiguity_type |
## model_null: subject) + (1 | word)
## model_at: relatedness ~ ambiguity_type + distance_bert + distance_elmo +
## model_at: Class + (1 + ambiguity_type | subject) + (1 | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_null 12 16613 16693 -8294.5 16589
## model_at 14 16557 16650 -8264.4 16529 60.258 2 8.225e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_at)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: relatedness ~ ambiguity_type + distance_bert + distance_elmo +
## Class + (1 + ambiguity_type | subject) + (1 | word)
## Data: df_merged_diff_only
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 16556.7 16650.1 -8264.4 16528.7 5818
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6103 -0.5737 -0.1261 0.5394 4.0320
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 0.64016 0.8001
## subject (Intercept) 0.07156 0.2675
## ambiguity_typePolysemy 0.05723 0.2392 -0.02
## ambiguity_typeUnsure 0.14553 0.3815 -0.90 0.46
## Residual 0.89435 0.9457
## Number of obs: 5832, groups: word, 115; subject, 77
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.92467 0.15847 5.835
## ambiguity_typePolysemy 1.28308 0.16516 7.769
## ambiguity_typeUnsure 2.85890 0.48958 5.839
## distance_bert -0.69841 0.15653 -4.462
## distance_elmo -2.00497 0.61166 -3.278
## ClassV -0.06958 0.17955 -0.388
##
## Correlation of Fixed Effects:
## (Intr) ambg_P ambg_U dstnc_b dstnc_l
## ambgty_typP -0.649
## ambgty_typU -0.260 0.214
## distanc_brt -0.284 0.000 -0.008
## distance_lm -0.356 0.022 0.027 -0.103
## ClassV -0.169 -0.109 0.063 0.039 -0.125
## convergence code: 0
## boundary (singular) fit: see ?isSingular
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5) +
theme_minimal() +
facet_wrap(~same + ambiguity_type)
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5,
aes(y = ..density..)) +
theme_minimal() +
facet_wrap(~same + ambiguity_type,
ncol = 3)
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5) +
theme_minimal() +
facet_wrap(~ambiguity_type, ncol = 1)
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5,
aes(y = ..density..)) +
theme_minimal() +
scale_x_continuous(limits = c(0, 4)) +
facet_wrap(~ambiguity_type, ncol = 1)
## Warning: Removed 4 row(s) containing missing values (geom_path).
Analysis 2: All data
Another way to model this would be to compare a model with an interaction of same * ambiguity_type to a model with only the main effects.
If ambiguity_type matters, it should matter primarily for different sense usages. That is, the effect of ambiguity_type should change as a function of whether a given comparison involves different or same sense usages of a word.
Note that since ambiguity_type is only manipulated across words, this analysis complements Analysis 1 above, which considers only different sense words. It's conceivable that one could observe a main effect of ambiguity_type for different sense words if the stimuli chosen to be homonyms are less related overall (including same sense usages). Thus, this analysis asks whether ambiguity_type has a different relationship with relatedness as a function of same sense vs. different sense.
model_interaction = lmer(data = df_merged,
relatedness ~ same * ambiguity_type +
distance_bert + distance_elmo +
Class +
(1 + same + ambiguity_type | subject) +
(1 + same | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
## boundary (singular) fit: see ?isSingular
model_both = lmer(data = df_merged,
relatedness ~ same + ambiguity_type +
distance_bert + distance_elmo +
Class +
(1 + same + ambiguity_type | subject) +
(1 + same | word),
control=lmerControl(optimizer="bobyqa"),
REML = FALSE)
## boundary (singular) fit: see ?isSingular
summary(model_interaction)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: relatedness ~ same * ambiguity_type + distance_bert + distance_elmo +
## Class + (1 + same + ambiguity_type | subject) + (1 + same | word)
## Data: df_merged
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 24904.1 25067.2 -12429.1 24858.1 8832
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6042 -0.5058 0.0485 0.5419 3.9860
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 0.64525 0.8033
## sameTRUE 0.63725 0.7983 -0.93
## subject (Intercept) 0.07420 0.2724
## sameTRUE 0.14536 0.3813 -0.77
## ambiguity_typePolysemy 0.03200 0.1789 0.09 -0.29
## ambiguity_typeUnsure 0.07407 0.2722 -0.69 0.94 0.04
## Residual 0.87961 0.9379
## Number of obs: 8855, groups: word, 115; subject, 77
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.89642 0.14769 6.070
## sameTRUE 2.53393 0.14418 17.575
## ambiguity_typePolysemy 1.27561 0.16392 7.782
## ambiguity_typeUnsure 2.86532 0.48945 5.854
## distance_bert -0.58128 0.11709 -4.964
## distance_elmo -2.24542 0.47098 -4.768
## ClassV 0.01607 0.07363 0.218
## sameTRUE:ambiguity_typePolysemy -0.88873 0.16575 -5.362
## sameTRUE:ambiguity_typeUnsure -2.15173 0.49877 -4.314
##
## Correlation of Fixed Effects:
## (Intr) smTRUE ambg_P ambg_U dstnc_b dstnc_l ClassV sTRUE:_P
## sameTRUE -0.883
## ambgty_typP -0.717 0.677
## ambgty_typU -0.259 0.251 0.217
## distanc_brt -0.197 0.113 0.000 -0.009
## distance_lm -0.287 0.129 0.010 0.025 -0.187
## ClassV -0.079 -0.005 -0.045 0.025 0.062 -0.082
## smTRUE:mb_P 0.671 -0.764 -0.904 -0.200 -0.019 -0.012 0.003
## smTRUE:mb_U 0.237 -0.261 -0.199 -0.909 -0.032 -0.025 0.001 0.221
## convergence code: 0
## boundary (singular) fit: see ?isSingular
anova(model_interaction, model_both)
## Data: df_merged
## Models:
## model_both: relatedness ~ same + ambiguity_type + distance_bert + distance_elmo +
## model_both: Class + (1 + same + ambiguity_type | subject) + (1 + same |
## model_both: word)
## model_interaction: relatedness ~ same * ambiguity_type + distance_bert + distance_elmo +
## model_interaction: Class + (1 + same + ambiguity_type | subject) + (1 + same |
## model_interaction: word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_both 21 24934 25083 -12446 24892
## model_interaction 23 24904 25067 -12429 24858 33.675 2 4.869e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
df_tidy = broom.mixed::tidy(model_interaction)
df_tidy %>%
filter(effect == "fixed") %>%
ggplot(aes(x = term,
y = estimate)) +
geom_point() +
coord_flip() +
geom_hline(yintercept = 0, linetype = "dotted") +
geom_errorbar(aes(ymin = estimate - 2*std.error,
ymax = estimate + 2*std.error),
width=.2,
position=position_dodge(.9)) +
labs(x = "Predictor",
y = "Estimate") +
theme_minimal()
Discussion
It appears that Ambiguity Type explains variance in relatedness above and beyond that already explained by cosine distance and same sense. In particular, different sense homonyms appear to be judged as less related, on average, than different sense polysemes (which span a wider range).
These visualizations also suggest that the different-sense Unsure items behave more like same sense items from the homonymy/polysemy stimuli. For this reason, we exclude them from future analyses (and from the relatedness dataset).
nrow(df_merged)
## [1] 8855
df_merged = df_merged %>%
filter(ambiguity_type != "Unsure")
nrow(df_merged)
## [1] 8624
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5) +
theme_minimal() +
facet_wrap(~same + ambiguity_type)
df_merged %>%
ggplot(aes(x = relatedness)) +
geom_histogram(bins = 5,
aes(y = ..density..)) +
theme_minimal() +
facet_wrap(~same + ambiguity_type,
ncol = 2)
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5) +
theme_minimal() +
facet_wrap(~ambiguity_type, ncol = 1)
df_merged %>%
ggplot(aes(x = relatedness,
color = same)) +
geom_freqpoly(bins = 5,
aes(y = ..density..)) +
theme_minimal() +
facet_wrap(~ambiguity_type, ncol = 1)